Abstract: One of the main concerns in traditional Wireless
Sensor Networks (WSNs) is energy efficiency. In this work, we
analyze two techniques that can extend network lifetime. The
first is Ambient Energy Harvesting (EH), i.e., the capability of
the devices to gather energy from the environment, whereas the
second is Wireless Energy Transfer (ET), which can be used to
exchange energy among devices. We study the combination of
these techniques, showing that they can be used jointly to improve
the system performance. We consider a transmitter-receiver pair,
showing how the ET improvement depends upon the statistics of
the energy arrivals and the energy consumption of the devices.
With the aim of maximizing a reward function, e.g., the average
transmission rate, we find performance upper bounds with and
without ET, define both online and offline optimization problems,
and present results based on realistic energy arrivals in indoor
and outdoor environments. We show that ET can significantly
improve the system performance even when a sizable fraction of
the transmitted energy is wasted and that, in some scenarios, the
online approach can obtain close to optimal performance.

Index Terms: energy transfer, energy harvesting, energy
cooperation, transmission policies.


I. Introduction 

In the past several years, a lot of research has focused
on Wireless Sensor Networks (WSNs), where one of the 
most important questions is how to prolong the network 
lifetime. In this work we discuss the combination of two 
different techniques: Ambient Energy Harvesting (EH), that 
allows a device to refill its battery gathering energy from the 
environment, and Wireless Energy Transfer (ET), that makes 
it possible to exchange energy among different devices. In 
this paper, we show how ET and EH can be jointly used to 
improve the overall system performance and prolong network 
lifetime. Indeed, in some scenarios, a node may receive much 
more energy and/or consume less energy than some of its 
neighbors. In these cases, it is reasonable to transmit energy 
from the rich energy source to other nodes in order to balance 
the energy levels. ET enables this possibility, and combining
it with Energy Harvesting is interesting because it makes it
possible to better exploit the renewable energy source and to
avoid energy overflows. An example application is the design of
energy-aware routing algorithms that exploit the possibility of
sharing energy.

As a first step to understand the key tradeoffs before 
addressing more complex scenarios, in this paper we consider 
a network composed of two devices (here we focus on a 
transmitter and a receiver but the model can be readily
extended to the case of two transmitters) equipped with Energy
Harvesting and Energy Transfer interfaces. We explicitly take
into account the effects of finite batteries and, differently
from most of the related literature, model the devices' energy
consumption with generic functions. We show that, in the
cases where the scenario is unbalanced, i.e., a device harvests 
much more energy than the other, it is possible to use Wireless 
Energy Transfer to balance the energy levels of the two devices 
and, as a consequence, to achieve higher rewards even when 
a significant fraction of the transmitted energy is wasted. 
We initially find analytical performance upper bounds with 
and without ET. Then, we investigate both online and offline 
approaches and compare them. We present two scenarios with 
realistic irradiation data showing that ET can be used to 
increase the average transmission rate. We also describe the 
effects of finite batteries on the system performance. 

The works most closely related to our paper, which studies
the combination of EH and ET, are [1]-[3], where Gurakan et
al. introduced the concept of energy cooperation, unifying
the study of energy harvesting and energy transfer techniques.
They considered a system composed of a few nodes and
investigated optimal offline communication schemes. However,
none of these papers considered the effects of finite batteries.
Also 0, ® studied the combination of ET and EH with 
infinite batteries and bi-directional energy transfer. In 0 the 
authors also presented the case of two transmitters with finite 
batteries. Differently from our work, these papers focused on 
optimal offline transmission policies and assumed ideal energy 
consumption. A model that considers the circuitry cost was
recently published in [6], where a transmitter and a receiver
powered by the same power source with infinite batteries can 
exchange energy. 

Energy Harvesting techniques for WSNs have been widely
investigated [7]-[9]. In [10], a survey of energy scavenging
methods was presented. The authors of [11] studied the network
performance when solar cells are used to receive energy, showing
how the harvested energy changes as a function of the latitude,
time of the day and season. Analytically, [12] formulated
the problem of maximizing the average value of the reported
data using a node with a rechargeable battery. Sharma et
al. studied heuristic delay-minimizing policies and sufficient
stability conditions for an Energy Harvesting Device (EHD) with
a data queue [13]-[15].

Energy harvesting receivers were analyzed in [16]-[18],
with particular focus on the optimization of the sampling
strategies. Also, [19] considered a transmitter-receiver pair
with harvesting capabilities, using energy consumption
functions similar to those considered in our work (see Section II).
The model that we use in this work is also similar to the
one proposed in [20], [21] for the optimization of an energy
harvesting system without ET.

Several different technologies for Energy Transfer have been
considered so far. In the literature, until recently, the main
focus was on RF Energy Transfer. This paradigm has been
studied for several decades (see [22] for a brief history of RF
energy transmission). In recent years, RF Energy Transfer
was also considered in WSNs [23], [24]. In most of the
literature, the authors assume a sink (a typical example
is the Powercaster Transmitter [25]) that supplies energy to
several passive sensor nodes (equipped with a Powerharvester
Receiver, for example). One of the main problems studied
in this area so far has been the combination of energy and
information transmission. Indeed, even if it would be
theoretically possible to transmit energy and data simultaneously,
this is not feasible with current technology [26]. Therefore, two
techniques were developed: Time Splitting (TS) and Power
Splitting (PS). In the first case the transferred energy and data
are sent at different times. In the second case the transmit
power is split: part of it is used for data and the rest for
energy. Works such as [27] or [28] studied the optimal power
splitting for the PS technique. TS was used in [29], where
transmission policies for a relay in a topology composed of
three nodes (source, relay and destination) were studied. [24]
proposed a medium access control mechanism based on ET
that achieves a high degree of fairness among the devices.
Recently, a network composed of one access point that transmits
RF energy to several nodes was studied, with the aim of
designing an admission control mechanism. In [31], the authors
studied the interleaving problems related to transmitting and
receiving energy simultaneously, introducing a polling-based
MAC protocol. [32] studied the case where some devices
(energy-rich sources) move through the network and refill the
batteries of the sensors with RF radiation. In [33] and [34] the
authors introduced an RF-MAC protocol, where nodes request
energy from some transmitters, and these cooperate by sending
RF energy to those nodes.

However, RF Energy Transfer, due to the radiative nature
of the mechanism, has a very low energy efficiency [35] or
requires line-of-sight and complex tracking systems [36]. For
these reasons, other techniques were introduced, e.g., inductive
coupling, which operates at distances less than a wavelength.
Clearly, even though this mechanism is very efficient, it
cannot be used in a WSN because of the very short operating
distance [37]. Another emerging technology is Strongly Coupled
Magnetic Resonances (SCMR) Energy Transfer, which
is a compromise between inductive coupling and RF Energy
Transfer: it can be used in mid-range applications (on the order
of 2-3 meters) and provides a relatively high efficiency. In [38],
it was shown that it is possible to power a 60 W light bulb at
a distance of 2 m with an efficiency of 40%. The authors also
extended this work in [39], showing that SCMR Energy Transfer
can be used to power several devices at the same time with
high efficiency. This is possible because non-radiative wireless
energy transfer is used, which relies on near-field magnetic
coupling of conductive loops [35]. In [37], the authors showed
that it is possible to achieve the maximum available energy
transfer efficiency regardless of the orientation of the device, as
long as the receiver is in the working range of the transmitter.
The two main problems related to SCMR Energy Transfer are
that: 1) it is necessary to use coils of large size (on the order
of 20 cm) and 2) the transmission range is limited to only a few
meters. For these reasons, it is reasonable to assume that the
devices are fixed, e.g., two devices in adjacent rooms of a building.
Even if SCMR ET seems promising, only a few applications
can be found in the literature so far. [40] considered a vehicle
that travels inside a WSN, periodically recharging the nodes
(one at a time) wirelessly, and showed that through periodic
charges the network may ideally remain operational for an
unlimited amount of time. The authors extended the study
to multiple transmissions in [41], and a similar technique
was also discussed in [42]. Some applications can be found
in biomedical implants, e.g., [43], and a wireless charger
prototype based on SCMR Energy Transfer was proposed
in [44].

Our contributions in this paper can be summarized as
follows. For a transmitter/receiver pair, we present performance
upper bounds with and without ET when the energy costs are
general functions that can include, e.g., the circuitry costs. The
optimal online and offline policies are introduced and
characterized. In particular, we use the offline case as a benchmark
for our online policies in the finite horizon setting. We show
that ET can significantly improve the system performance and
that, in some scenarios, the online policies are close to optimal.
We also consider the effects of finite batteries, showing that,
although the reward improvement depends upon the battery
size, it is not necessary to have very large batteries to obtain
high gains.

The paper is organized as follows. Section II defines the
system model we analyzed, and Section III provides the
performance upper bounds. In Sections IV and V we introduce the
online and offline policies, respectively. Section VI presents
the numerical evaluation for the online policies. In Section VII
we analyze two practical examples using realistic irradiation
data. Finally, Section VIII concludes the paper.


II. System Model and Optimization Problems 

We study Energy Harvesting Devices (EHDs) that, in
addition to the capability of gathering energy from the
environment, are also able to transmit and/or receive energy via
a Wireless Energy Transfer (ET) mechanism. To characterize
this technique, we will deal with a pair of EHDs (or simply
devices or sensors) where one device is the transmitter (TX)
that sends data to a receiver (RC), whereas RC can send energy
to TX (we will comment on the extension to bi-directional ET
in Section III-D).

We assume a slotted-time system, where slot $k$ corresponds
to the time interval $[k-1, k)$, with $k$ a positive integer. Both
devices are equipped with some interface that can harvest
ambient energy, e.g., from solar light, indoor light, or vibrations.
We also assume that the EHDs are temporally synchronized.

TX transmits data packets toward RC and, in every slot,
has a new data packet to send. In general, modeling the
transmitter and receiver energy costs is a difficult task: to
perform a transmission, in addition to the transmit power, the
costs of sensing, pre-processing (coding) and compressing
the data p3| also have to be considered. For packet reception,
instead, the main contributions are sampling (demodulation,
filtering, quantization), processing (decoding) and storage ID.
We simplify the energy consumption models as follows. For
reliable communications at rate $R$, TX needs to provide an
SNR (thus a transmit power) that depends upon $R$. Similarly,
the reception power also depends upon $R$ because of sampling
and processing. By combining these concepts, it is possible to
establish a relationship between the reception power and the
transmit power (see ID). Formally, we describe the energy
consumptions with two generic continuous, increasing and
concave^1 functions $q^{\mathrm{tx}}(P)$ and $q^{\mathrm{rc}}(P)$, where $P$ is the
transmit power and $q^{\mathrm{tx}}(0) = q^{\mathrm{rc}}(0) = 0$ (when a device is in
sleep mode, it is assumed to consume negligible energy). The
transmit power used in slot $k$, $P_k \in \mathcal{S} = [0, \rho_{\max}]$, is decided
at the beginning of each slot.

Example 1. For a transmitter, a common model for the energy 
function is & /ID 

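$$q^{\mathrm{tx}}(P) = \alpha^{\mathrm{tx}} P. \qquad (1)$$

For the receiver, instead, a reasonable approximation is
to assume that the energy function is proportional to the
transmission rate:

$$q^{\mathrm{rc}}(P) = \alpha^{\mathrm{rc}} \ln(1 + A P). \qquad (2)$$

This model is a good approximation when the circuitry
costs are negligible. Note that in the low-SNR regime, we can
approximate $q^{\mathrm{rc}}(P)$ as $q^{\mathrm{rc}}(P) \approx \alpha^{\mathrm{rc}} A P$, which is linear
in $P$. Here $\alpha^{\mathrm{tx}}$, $\alpha^{\mathrm{rc}}$ are proper constants and $A$ is an SNR
scaling factor.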
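As a minimal sketch of the two energy models of Example 1, the functions of Equations (1) and (2) can be written in Python. The names `q_tx`, `q_rc`, `q_rc_low_snr`, `alpha_tx`, `alpha_rc` and `snr_scale` (standing for the SNR scaling factor) are illustrative, and the default values are arbitrary assumptions rather than parameters taken from the paper.

```python
import math


def q_tx(p, alpha_tx=1.0):
    """Transmitter energy cost, Eq. (1): linear in the transmit power p."""
    return alpha_tx * p


def q_rc(p, alpha_rc=1.0, snr_scale=1.0):
    """Receiver energy cost, Eq. (2): proportional to the rate ln(1 + A p)."""
    return alpha_rc * math.log1p(snr_scale * p)


def q_rc_low_snr(p, alpha_rc=1.0, snr_scale=1.0):
    """First-order low-SNR approximation of q_rc, using ln(1 + x) ~ x."""
    return alpha_rc * snr_scale * p
```

Both functions vanish at zero power, and `q_rc` is concave in `p`, as required by the system model.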

The contributions of the circuitry costs can be included in
this model by adding to $q^{\mathrm{tx}}(P)$ and $q^{\mathrm{rc}}(P)$ two terms
$C^{\mathrm{tx}}(P)$ and $C^{\mathrm{rc}}(P)$ that, starting from 0, increase quickly to
constant values, so as to preserve the continuity and concavity of
$q^{\mathrm{tx}}(P)$ and $q^{\mathrm{rc}}(P)$. Note that, in the general case, our model
allows the circuitry costs for TX and RC to be different.

The amount of energy to be sent with the Energy Transfer
mechanism, $D_k \ge 0$, is decided in every slot. The energy
received in slot $k$ can be exploited only in a later slot. In
our work we focus on uni-directional energy transfer from
RC to TX. We will discuss in Section III-D how to extend
this hypothesis to the bi-directional case. We assume that
only a fraction $\beta$ of the transmitted energy is received, where
$\beta \in [0,1]$ is the energy transfer efficiency. Note that the 40%
efficiency claimed in [38] for a distance of 2 m refers only
to the transmission itself. Indeed, the effective wall-to-load
efficiency (ratio between the received power and the power
extracted from the wall power outlet) was 15% and, for this
reason, in this work we will use a transfer efficiency $\beta = 0.15$
as a baseline.

The devices have finite batteries that can store at most
$E_{\max}^{\mathrm{tx}}$ and $E_{\max}^{\mathrm{rc}}$ Joule of energy. The randomness of the energy
arrivals is described through two independent processes $\{B_k^{\mathrm{tx}}\}$
and $\{B_k^{\mathrm{rc}}\}$ with some statistics, e.g., deterministic, Bernoulli or
truncated geometric. The energy arrival processes have means
$\bar{b}^{\mathrm{tx}} > 0$ and $\bar{b}^{\mathrm{rc}} > 0$ and the energy harvested in a slot can be
exploited only in a later slot.

^1 In this paper, the term "concave" will be used to designate concave
downward functions, e.g., functions with non-positive second derivative.

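With the introduced quantities, the evolutions of the two
batteries can be described as:

$$E_{k+1}^{\mathrm{tx}} = \min\{E_k^{\mathrm{tx}} - q^{\mathrm{tx}}(P_k) + \beta D_k + B_k^{\mathrm{tx}},\; E_{\max}^{\mathrm{tx}}\}, \qquad (3)$$

$$E_{k+1}^{\mathrm{rc}} = \min\{E_k^{\mathrm{rc}} - q^{\mathrm{rc}}(P_k) - D_k + B_k^{\mathrm{rc}},\; E_{\max}^{\mathrm{rc}}\}, \qquad (4)$$

where $E_k^{\mathrm{tx}}$, $E_k^{\mathrm{rc}}$ are the energy levels in slot $k$. Since we
consider slots of fixed length, in this work we refer to power
or energy interchangeably.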
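The battery dynamics of Equations (3) and (4) can be sketched as a single-slot update in Python. The function name `battery_step`, its argument names, and the explicit feasibility check (the energy spent in a slot must already be stored in the battery) are assumptions made for illustration; the paper only states the feasibility requirement informally.

```python
def battery_step(e_tx, e_rc, p, d, b_tx, b_rc, q_tx, q_rc, beta,
                 e_tx_max, e_rc_max):
    """One slot of the battery evolution, Eqs. (3)-(4).

    e_tx, e_rc: current energy levels; p: transmit power; d: energy sent
    by RC to TX; b_tx, b_rc: energy harvested in this slot (usable only
    from the next slot); beta: transfer efficiency in [0, 1].
    """
    if p < 0 or d < 0:
        raise ValueError("power and transferred energy must be non-negative")
    if q_tx(p) > e_tx or q_rc(p) + d > e_rc:
        raise ValueError("action exceeds the energy stored in a battery")
    # Harvested and transferred energy arrive after consumption; any
    # excess over the battery capacity is lost (energy overflow).
    e_tx_next = min(e_tx - q_tx(p) + beta * d + b_tx, e_tx_max)
    e_rc_next = min(e_rc - q_rc(p) - d + b_rc, e_rc_max)
    return e_tx_next, e_rc_next
```

For instance, with linear costs, `beta = 0.15`, and RC sending 2 units of energy, TX gains only 0.3 units from the transfer, which makes the cost of a low efficiency explicit.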

With finite batteries, energy outage (empty battery) and
energy overflow (new energy arrivals that cannot be stored due
to a fully charged battery) degrade the system performance
and have to be considered in the design of a transmission
policy p6). We assume that the state of the system
$S_k = (E_k^{\mathrm{tx}}, E_k^{\mathrm{rc}})$ is known to both devices at the beginning of
every slot, thereby obtaining an upper bound to the achievable
system performance, and leave the case of imperfect knowledge
for future study.

A. Optimization Problems 

A policy $\mu$ defines which action to perform in every
slot $k$, i.e., how much energy should be used to transmit
data ($\mathbf{P} = \{P_1, P_2, \ldots\}$) and how much energy should be
transferred ($\mathbf{D} = \{D_1, D_2, \ldots\}$).^2

In this work we consider as metrics the average
unconstrained rewards in $K$ slots and in the long term, defined as

$$G_\mu^{K} \triangleq \frac{1}{K}\, \mathbb{E}\left[\sum_{k=1}^{K} g(P_k)\right], \qquad (5)$$

$$G_\mu \triangleq \liminf_{K \to \infty} G_\mu^{K}, \qquad (6)$$

where $g(x)$ is a non-decreasing and concave function of $x$. As
a baseline, we focus on the average normalized transmission
rate, obtained when $g(x) = \ln(1 + A x)$, where $A$ is an SNR
scaling factor.

Optimizing the long-term average reward is a typical
assumption because sensors are generally expected to operate
for long times in the same scenario. However, our model
can be adapted to different reward functions. For example,
if the discounted long-term reward were considered, then the
optimization techniques would remain the same.

We consider the following optimization problem

$$\mu^\star = \arg\max_{\mu} G_\mu, \qquad (7)$$

subject to appropriate feasibility constraints (i.e., the
transmission power and the transferred energy must be non-negative
and must not exceed the energy available in the batteries).
Since the optimization variables and the specific constraints
depend upon the chosen approach, this problem will be
discussed in more detail in Sections IV and V.

^2 The specific structure of $\mu$ depends upon the considered scenario and will
be discussed in more detail in Sections IV and V.




B. Optimization Approaches 

In the previous sections we set up the system model and an
optimization problem. The goal of this paper is to solve this
problem. More precisely, we proceed as follows. Initially, we
introduce some performance upper bounds, i.e., upper bounds
to $G_\mu$. These do not depend upon the optimization technique.
Then, we discuss (7) with two approaches (Sections IV and V).

1) Online approach: In this case, in every time slot $k$, the
policy chooses an action that depends upon the current state
of the system and upon the energy arrival statistics. In the
online case, the output of the optimization process is a set of
rules (one for every state of the system) that, given $S_k$, can
be applied to choose the action to perform. In order to model
the system as a Markov Decision Process, in Section IV we
approximate the continuous model with a discrete one.

2) Offline approach: In this case, the policies are found 
by exploiting the non-causal knowledge of the energy arrivals 
(not only the statistics). In the offline case, the output of the 
optimization process is a pair of sequences (P, D) that define 
in every slot from 1 to K which action to use. 

The main focus of our work is on online policies which,
though performing worse than offline policies in general,
have the important advantage of not requiring non-causal
knowledge of the energy arrivals, and are therefore practically
usable. Offline policies will be used in Section VII as a
benchmark, showing that in some relevant cases the performance
loss incurred by the online approach can be quite small.


III. Upper Bounds 

In this section we introduce upper bounds to $G_\mu$ for the
cases with and without ET. This is an interesting problem
because the presented upper bounds are closely approached in
several cases of interest and provide an easy characterization
of the system reward without performing any optimization.

They are derived in the infinite horizon case, but can be
simply reformulated in the finite horizon case by replacing the
long-term means $\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$ with the means in $K$ slots.^3 In
particular, we will generalize the following intuitive results.
As an example, consider $q^{i}(P) = P$ (in the following,
$i \in \{\mathrm{tx}, \mathrm{rc}\}$) and $\bar{b}^{\mathrm{rc}} > \bar{b}^{\mathrm{tx}}$ (RC harvests more energy than
TX). An upper bound to $G_\mu$ without ET is given by $g(\bar{b}^{\mathrm{tx}})$
and is achievable if the devices consume in every slot (except
possibly for a vanishing fraction of them) an amount equal to
the average harvested energy. This can happen if the battery
sizes are infinite or if the batteries are finite and the energy
arrivals are deterministic. Moreover, since $\bar{b}^{\mathrm{rc}} > \bar{b}^{\mathrm{tx}}$, it may
be interesting to use ET to improve the performance (we
recall that RC can send energy to TX). In this case an upper
bound is given by a balanced combination of the transmitter
and receiver average energy arrivals:
$g\big((\beta \bar{b}^{\mathrm{rc}} + \bar{b}^{\mathrm{tx}})/(\beta + 1)\big)$.
Note that in this last expression both $\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$ contribute to
increasing the upper bound. Also, we remark that the transfer
efficiency needs to be considered. These considerations are
formalized in the general case in the following (note that,
unlike in the above example, we do not impose any constraints
on $\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$).

^3 In the following, we find upper bounds based on the means of the
harvesting processes. Thus, even if we do not explicitly take into account
the specific random behavior of the energy arrivals, we are still considering
the fact that energy is gathered over time, which is a fundamental feature of
EH.
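The intuitive example with unit-slope costs ($q^{i}(P) = P$) and RC harvesting more than TX can be checked numerically. The sketch below is only an illustration: the function names `rate` and `intro_bounds`, the use of the rate reward $g(x) = \ln(1 + A x)$ with $A = 1$, and the numbers in the usage line (means of 1 and 5, efficiency 0.15) are assumptions chosen for the example.

```python
import math


def rate(x, snr_scale=1.0):
    """Reward g(x) = ln(1 + A x), the normalized transmission rate."""
    return math.log1p(snr_scale * x)


def intro_bounds(b_tx, b_rc, beta, snr_scale=1.0):
    """Upper bounds without and with ET for q_tx(P) = q_rc(P) = P.

    Only covers the case discussed in the text, where RC harvests more
    energy than TX (b_rc > b_tx).
    """
    if not b_rc > b_tx:
        raise ValueError("this sketch assumes b_rc > b_tx")
    no_et = rate(b_tx, snr_scale)
    # With ET, the two mean arrivals are balanced, weighted by beta.
    balanced_power = (beta * b_rc + b_tx) / (beta + 1.0)
    with_et = rate(balanced_power, snr_scale)
    return no_et, with_et


no_et, with_et = intro_bounds(b_tx=1.0, b_rc=5.0, beta=0.15)
```

With these numbers the balanced power is 1.75/1.15, so the ET bound exceeds the bound without ET even though 85% of the transferred energy is lost; with `beta = 0` the two bounds coincide.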

A. Upper Bound without ET 

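We first focus on the case without ET. We have the
following result.

Theorem 1 (Upper Bound without ET). If there exist two
continuous and increasing functions $\Psi^{\mathrm{tx}}(\cdot)$, $\Psi^{\mathrm{rc}}(\cdot)$ such that

1) $0 \le \Psi^{\mathrm{tx}}(P) \le q^{\mathrm{tx}}(P)$ and $0 \le \Psi^{\mathrm{rc}}(P) \le q^{\mathrm{rc}}(P)$,
$\forall P \in \mathcal{S}$, and

2) $g(\Psi^{\mathrm{tx}^{-1}}(\cdot))$ and $g(\Psi^{\mathrm{rc}^{-1}}(\cdot))$ are concave functions,

then an upper bound for the reward $G_\mu$ is

$$G_{\mathrm{u.b.}}^{\mathrm{noET}} = g\big(\min\{\Psi^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}),\, \Psi^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})\}\big). \qquad (8)$$

If only $\Psi^{i}(\cdot)$ exists, $i \in \{\mathrm{tx}, \mathrm{rc}\}$, then an upper bound is

$$G_{\mathrm{u.b.}}^{\mathrm{noET}} = g\big(\Psi^{i^{-1}}(\bar{b}^{i})\big).$$

If neither $\Psi^{\mathrm{tx}}(P)$ nor $\Psi^{\mathrm{rc}}(P)$ exists, then the optimal
reward is infinite.

Proof: See the Appendix. ■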

Note that, in the previous theorem, we convert a power
consumption to a reward in two steps. First, we apply the
inverse function $\Psi^{i^{-1}}(\cdot)$ to convert the power consumption into
a transmission power. Then, we apply the function $g(\cdot)$ to
the transmission power in order to obtain the corresponding
reward.

In practice, $\Psi^{i}(\cdot)$ is an optimistic auxiliary energy
consumption function that makes it possible to mathematically
obtain (8). Intuitively, the closer $\Psi^{i}(\cdot)$ and $q^{i}(\cdot)$, the tighter
the upper bound.
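The two-step conversion behind the bound (8) can be sketched as follows. This is an illustration under stated assumptions: the helper name `upper_bound_no_et` is hypothetical, the caller is expected to supply the inverse auxiliary functions, and the usage lines take the energy models of Example 1 with all constants set to 1 (so that the auxiliary functions coincide with the true costs).

```python
import math


def upper_bound_no_et(b_tx, b_rc, psi_tx_inv, psi_rc_inv, g):
    """Bound (8): map each mean arrival to a power, keep the bottleneck,
    then map that power to a reward with g."""
    bottleneck_power = min(psi_tx_inv(b_tx), psi_rc_inv(b_rc))
    return g(bottleneck_power)


# Example 1 models with alpha_tx = alpha_rc = A = 1 (assumed values):
# q_tx(P) = P            -> inverse x
# q_rc(P) = ln(1 + P)    -> inverse exp(x) - 1
tx_inv = lambda x: x
rc_inv = lambda x: math.expm1(x)
bound = upper_bound_no_et(1.0, 0.5, tx_inv, rc_inv, math.log1p)
```

In this instance RC is the bottleneck (its mean arrival of 0.5 supports a power of about 0.65, less than the power 1 that TX could sustain), and the resulting bound equals the receiver mean arrival divided by its cost constant.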

Remark 1. If the set of admissible transmit powers is bounded,
i.e., $\rho_{\max} < \infty$, then the conditions of Theorem 1 only need to
be satisfied for a finite range of $P$, and therefore it is always
possible to find $\Psi^{i}(\cdot)$.

Note that a particular case of the previous remark is obtained
when the battery sizes are finite. In this case $\rho_{\max}$ is bounded
by the maximum battery size (in particular $q^{i}(\rho_{\max}) \le E_{\max}^{i}$).

As shown in the following corollaries, there exist cases in
which the upper bound of Theorem 1 can be achieved.

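Corollary 1. If $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$ and
$q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}) \le q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})$
(TX is the bottleneck) then, in the deterministic energy arrivals
case,^4 the upper bound (8) is achievable with finite batteries
$E_{\max}^{\mathrm{tx}} \ge \bar{b}^{\mathrm{tx}}$, $E_{\max}^{\mathrm{rc}} \ge q^{\mathrm{rc}}(q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}))$.
An optimal policy is

$$P_k = \begin{cases} q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}), & \text{if } \bar{b}^{\mathrm{tx}} \le E_k^{\mathrm{tx}} \text{ and } q^{\mathrm{rc}}(q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})) \le E_k^{\mathrm{rc}}, \\ 0, & \text{otherwise.} \end{cases} \qquad (9)$$

The same holds if the roles of TX and RC are exchanged.

^4 Note that, since we consider i.i.d. energy arrivals, deterministic is
equivalent to constant.

Proof: Let $v = q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$.

Assume that at the beginning $E_1^{\mathrm{tx}} = E_1^{\mathrm{rc}} = 0$. The batteries
evolution is the following: $E_2^{\mathrm{tx}} = \bar{b}^{\mathrm{tx}}$, $E_2^{\mathrm{rc}} = \bar{b}^{\mathrm{rc}}$. Note that
$q^{\mathrm{tx}}(v) = \bar{b}^{\mathrm{tx}}$ by definition and $q^{\mathrm{rc}}(v) \le \bar{b}^{\mathrm{rc}}$ by hypothesis. At
$k = 3$, we have: $E_3^{\mathrm{tx}} = 2\bar{b}^{\mathrm{tx}} - q^{\mathrm{tx}}(v) = \bar{b}^{\mathrm{tx}}$ (transmit with
power $v$ and then harvest an amount of energy exactly equal
to $\bar{b}^{\mathrm{tx}}$) and $E_3^{\mathrm{rc}} = 2\bar{b}^{\mathrm{rc}} - q^{\mathrm{rc}}(v) \ge \bar{b}^{\mathrm{rc}}$. Thus, in every slot,
excluding an initial transient, TX can transmit data with power
$v$ and RC is always able to receive them, thus the reward
per slot is $g(v)$. In the long term, the upper bound in (8) is
achieved. With different initial states the reasoning is the same.

Note that in the previous considerations we implicitly used
the hypotheses on the battery sizes $E_{\max}^{\mathrm{tx}}$ and $E_{\max}^{\mathrm{rc}}$ stated in the
corollary, which are necessary to obtain the thesis. ■

The policy of Equation (9), possibly excluding an initial
transient, consumes all the energy that is received in every
slot, and thus achieves the upper bound $g(q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}))$.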
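The argument of the proof can be reproduced with a short simulation of a threshold policy in the spirit of Equation (9) under constant energy arrivals and no ET. This is a sketch under assumptions: the function name `simulate_corollary1`, the linear cost functions, the battery sizes and the arrival values in the usage lines are all illustrative choices, not values from the paper.

```python
import math


def simulate_corollary1(b_tx, b_rc, q_tx, q_rc, q_tx_inv,
                        e_tx_max, e_rc_max, num_slots, g):
    """Average reward of the threshold policy with constant arrivals.

    TX transmits with power v = q_tx_inv(b_tx) whenever both batteries
    hold enough energy for that transmission, and stays silent otherwise.
    Batteries start empty and no energy is transferred (D_k = 0).
    """
    v = q_tx_inv(b_tx)
    e_tx = e_rc = 0.0
    total_reward = 0.0
    for _ in range(num_slots):
        can_send = b_tx <= e_tx and q_rc(v) <= e_rc
        p = v if can_send else 0.0
        total_reward += g(p)
        # Battery updates of Eqs. (3)-(4) with D_k = 0.
        e_tx = min(e_tx - q_tx(p) + b_tx, e_tx_max)
        e_rc = min(e_rc - q_rc(p) + b_rc, e_rc_max)
    return total_reward / num_slots


# Assumed setup: q_tx(P) = P, q_rc(P) = 0.5 P, arrivals 1.0 and 0.8.
avg = simulate_corollary1(
    b_tx=1.0, b_rc=0.8,
    q_tx=lambda p: p, q_rc=lambda p: 0.5 * p, q_tx_inv=lambda x: x,
    e_tx_max=1.0, e_rc_max=2.0, num_slots=1000, g=math.log1p)
bound = math.log1p(1.0)  # g(v) with v = 1
```

Only the first slot is lost to the initial transient, so the average reward approaches the bound as the number of slots grows.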

When the battery sizes are infinite, Corollary 1 generalizes
to any energy arrival process.

Corollary 2. If $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$,
$q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}) \le q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})$
(TX is the bottleneck) and the battery sizes are infinite then
the upper bound (8) is achievable for any statistics of the
energy arrivals. The same holds if the roles of TX and RC are
exchanged.

A formal proof of Corollary 2 is given in p7) for the
special case of a linear energy consumption model in a single
EHD, but can be extended to our case. To show that a reward
arbitrarily close to the upper bound can be achieved, a
Save-and-Transmit Scheme was introduced, where the device does
not transmit in an initial transient in order to accumulate
enough energy to absorb energy fluctuations, so as to avoid
energy outage and manage to consume and receive, on average,
the same energy.

B. Upper Bound with ET 

We now derive similar results for the case where ET is
considered. We introduce two new functions $c^{\mathrm{tx}}(\cdot)$ and $c^{\mathrm{rc}}(\cdot)$
defined as follows:

$$c^{\mathrm{tx}}(\xi) = \bar{b}^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}} (1 - \xi), \qquad (10)$$

$$c^{\mathrm{rc}}(\xi) = \xi\, \bar{b}^{\mathrm{rc}}, \qquad (11)$$

where $\xi \in [0, 1]$ is a constant that represents the average
fraction of the harvested energy that RC keeps for itself under
a policy $\mu$, so that a fraction $1 - \xi$ is transferred with ET.
$c^{i}(\xi)$ represents the average amount of
energy that can be exploited at device $i \in \{\mathrm{tx}, \mathrm{rc}\}$ to transmit
or receive. In particular, RC transfers part of the harvested
energy, thus the residual energy that it can exploit is, on
average, only a fraction $\xi$ of the harvested one ($\bar{b}^{\mathrm{rc}}$). TX, in
addition to its own harvested energy ($\bar{b}^{\mathrm{tx}}$), receives the energy
that RC transferred (scaled by the energy transfer efficiency
$\beta$). One of the key results of the paper is stated in the following
theorem.

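Theorem 2 (Upper Bound with ET). Under the hypotheses
of Theorem 1, when ET is used, an upper bound for $G_\mu$ is

$$G_{\mathrm{u.b.}}^{\mathrm{ET}} = g\big(\Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))\big), \qquad (12)$$

where

• if $\Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(1)) \le \Psi^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(1))$, then $\xi^\star = 1$;

• otherwise, $\xi^\star$ is such that
$\Psi^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(\xi^\star)) = \Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))$.

Proof: From Theorem 1, an upper bound is given using
$\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$ inside the min operation. When ET is used, the
average amounts of incoming energy at TX and RC are $c^{\mathrm{tx}}(\xi)$
and $c^{\mathrm{rc}}(\xi)$, respectively. Thus, when $\xi$ is fixed, an upper bound
is

$$G_{\mathrm{u.b.}}^{\mathrm{ET}}(\xi) = g\big(\min\{\Psi^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(\xi)),\, \Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi))\}\big). \qquad (13)$$

In practice, we replaced $\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$ with $c^{\mathrm{tx}}(\xi)$ and $c^{\mathrm{rc}}(\xi)$
because, with ET, the energy that the devices can exploit is
described by $c^{\mathrm{tx}}(\xi)$ and $c^{\mathrm{rc}}(\xi)$ (see the description of (10)-(11)).

Note that $\Psi^{i^{-1}}(\cdot)$ is an increasing and continuous function
because $\Psi^{i}(\cdot)$ is increasing and continuous. Moreover,
$\mathrm{d}c^{\mathrm{tx}}(\xi)/\mathrm{d}\xi < 0$ and $\mathrm{d}c^{\mathrm{rc}}(\xi)/\mathrm{d}\xi > 0$. Thus, the first argument
of the minimum in (13) is decreasing in $\xi$, whereas the second
one is increasing. The minimum of the two is maximized
when they are equal, if this is possible, or otherwise for
the maximum value of $\xi$, i.e., $\xi^\star = 1$. Note that, since at
$\xi = 0$ the first argument is larger than the second one
($c^{\mathrm{rc}}(0) = 0$), $\xi^\star$ is equal to one if and
only if at $\xi = 1$ we have $\Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(1)) \le \Psi^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(1))$, i.e.,
$\Psi^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(\xi))$ and $\Psi^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi))$ do not have an intersection
point in $[0, 1)$. ■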
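Because the first argument of the minimum in (13) decreases in the retained fraction and the second one increases, the optimal fraction can be located by bisection. The sketch below illustrates this; the names `xi_star` and `upper_bound_et`, the tolerance, and the numbers in the usage lines (unit-slope costs, means 1 and 5, efficiency 0.15) are assumptions made for the example.

```python
import math


def xi_star(b_tx, b_rc, beta, psi_tx_inv, psi_rc_inv, tol=1e-12):
    """Fraction of its harvested energy that RC keeps, as in Theorem 2."""
    def gap(xi):
        c_tx = b_tx + beta * b_rc * (1.0 - xi)   # Eq. (10)
        c_rc = xi * b_rc                         # Eq. (11)
        return psi_tx_inv(c_tx) - psi_rc_inv(c_rc)

    # If RC is still the bottleneck when it keeps everything, no transfer.
    if gap(1.0) >= 0.0:
        return 1.0
    # Otherwise gap is positive at 0 and negative at 1: bisect on the sign.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_bound_et(b_tx, b_rc, beta, psi_tx_inv, psi_rc_inv, g):
    """Bound (12), returned together with the maximizing fraction."""
    xi = xi_star(b_tx, b_rc, beta, psi_tx_inv, psi_rc_inv)
    return g(psi_rc_inv(xi * b_rc)), xi


identity = lambda x: x
bound, xi = upper_bound_et(1.0, 5.0, 0.15, identity, identity, math.log1p)
```

For unit-slope costs the result agrees with the balanced combination of the introductory example: the bound is evaluated at the power 1.75/1.15, reached when RC keeps a fraction 1.75/5.75 of its harvested energy.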


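Corollaries 1 and 2 can be generalized as follows.

Corollary 3. If $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$ and $q^{\mathrm{rc}}(\cdot) = \Psi^{\mathrm{rc}}(\cdot)$ then, in
the deterministic energy arrivals case, the upper bound (12)
is achievable with finite batteries
$E_{\max}^{\mathrm{tx}} \ge q^{\mathrm{tx}}(q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star)))$,
$E_{\max}^{\mathrm{rc}} \ge \bar{b}^{\mathrm{rc}}$. An optimal policy is

$$P_k = \begin{cases} q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star)), & \text{if } c^{\mathrm{rc}}(\xi^\star) \le E_k^{\mathrm{rc}} \text{ and } q^{\mathrm{tx}}(q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))) \le E_k^{\mathrm{tx}}, \\ 0, & \text{otherwise,} \end{cases} \qquad (14)$$

$$D_k = \begin{cases} \bar{b}^{\mathrm{rc}} - q^{\mathrm{rc}}(P_k), & \text{if } E_k^{\mathrm{rc}} \ge \bar{b}^{\mathrm{rc}}, \\ 0, & \text{otherwise.} \end{cases} \qquad (15)$$

Proof: The proof is similar to that of Corollary 1. Let
$v = q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))$. At $k = 2$, $E_2^{\mathrm{tx}} = \bar{b}^{\mathrm{tx}}$ and $E_2^{\mathrm{rc}} = \bar{b}^{\mathrm{rc}}$.

If $\bar{b}^{\mathrm{tx}}$ is greater than or equal to $q^{\mathrm{tx}}(v)$, then the policy
chooses $P_2 = v$ because $q^{\mathrm{rc}}(v) \le \bar{b}^{\mathrm{rc}}$ by definition of $c^{\mathrm{rc}}(\cdot)$
and $D_2 = \bar{b}^{\mathrm{rc}} - q^{\mathrm{rc}}(v)$ because $E_2^{\mathrm{rc}} \ge \bar{b}^{\mathrm{rc}}$. Note that the sum
$q^{\mathrm{rc}}(P_2) + D_2$ is equal to $\bar{b}^{\mathrm{rc}}$, thus, at $k = 3$, $E_3^{\mathrm{rc}} = \bar{b}^{\mathrm{rc}}$.
Instead, for TX,
$E_3^{\mathrm{tx}} = \bar{b}^{\mathrm{tx}} - q^{\mathrm{tx}}(v) + \bar{b}^{\mathrm{tx}} + \beta(\bar{b}^{\mathrm{rc}} - q^{\mathrm{rc}}(v)) = \bar{b}^{\mathrm{tx}} - q^{\mathrm{tx}}(v) + c^{\mathrm{tx}}(\xi^\star)$.
If $\xi^\star < 1$, then $E_3^{\mathrm{tx}} = \bar{b}^{\mathrm{tx}}$ because
$v = q^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(\xi^\star))$, otherwise $E_3^{\mathrm{tx}} \ge \bar{b}^{\mathrm{tx}}$ (see Theorem 2).

If instead $\bar{b}^{\mathrm{tx}} < q^{\mathrm{tx}}(v)$, the policy chooses $P_2 = 0$ and
$D_2 = \bar{b}^{\mathrm{rc}}$. Note that, if $\xi^\star = 1$, we have
$q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}) \le q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$ and the inequality chain becomes
$q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}) \le q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}) < q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})$,
which is not possible. Thus $\xi^\star$ must
be less than 1 and $q^{\mathrm{tx}^{-1}}(c^{\mathrm{tx}}(\xi^\star)) = q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))$ implies
that $q^{\mathrm{tx}}(v) = c^{\mathrm{tx}}(\xi^\star) > \bar{b}^{\mathrm{tx}}$. At $k = 3$,
$E_3^{\mathrm{tx}} = 2\bar{b}^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}} \ge c^{\mathrm{tx}}(\xi^\star) = q^{\mathrm{tx}}(v)$ and $E_3^{\mathrm{rc}} = \bar{b}^{\mathrm{rc}}$.
For $k \ge 3$, TX always has enough energy to transmit with power $v$.

The previous considerations hold if the battery sizes satisfy
the hypotheses of the corollary. Thus, after an initial transient,
the devices always have enough energy to transmit and receive
with power $v$ and in the long term the upper bound (12) is
achieved. ■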
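The steady state described in the proof can be observed in a small simulation of the joint transmission and transfer policy for constant arrivals. The sketch assumes unit-slope costs for both devices, so that the target power has the closed form of the introductory example; the function name `simulate_corollary3`, the battery sizes and the arrival values are illustrative assumptions.

```python
import math


def simulate_corollary3(b_tx, b_rc, beta, e_tx_max, e_rc_max, num_slots):
    """Average rate of the policy of Eqs. (14)-(15) with q_tx(P) = q_rc(P) = P.

    Assumes b_rc > b_tx, so that RC keeps only part of its energy and the
    target power is the balanced value (b_tx + beta * b_rc) / (1 + beta).
    Returns the simulated average reward and the upper bound g(v).
    """
    v = (b_tx + beta * b_rc) / (1.0 + beta)
    e_tx = e_rc = 0.0
    total_reward = 0.0
    for _ in range(num_slots):
        # Eq. (14): transmit only if both devices can afford power v.
        p = v if (v <= e_rc and v <= e_tx) else 0.0
        # Eq. (15): RC gives away whatever it does not spend on reception.
        d = b_rc - p if e_rc >= b_rc else 0.0
        total_reward += math.log1p(p)
        # Battery updates of Eqs. (3)-(4) with constant arrivals.
        e_tx = min(e_tx - p + beta * d + b_tx, e_tx_max)
        e_rc = min(e_rc - p - d + b_rc, e_rc_max)
    return total_reward / num_slots, math.log1p(v)


avg, bound = simulate_corollary3(
    b_tx=1.0, b_rc=5.0, beta=0.15,
    e_tx_max=10.0, e_rc_max=5.0, num_slots=1000)
```

In this run the first two slots form the transient (TX first has to accumulate transferred energy), after which the pair transmits at the target power in every slot, well above what TX could sustain from its own arrivals alone.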

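Corollary 4. If $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$, $q^{\mathrm{rc}}(\cdot) = \Psi^{\mathrm{rc}}(\cdot)$ and the
battery sizes are infinite, then the upper bound (12) is achievable
for any statistics of the energy arrivals.

Proof: See Corollary 2. ■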

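The following result establishes when it is beneficial to use
ET.

Proposition 1. If $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$ and $q^{\mathrm{rc}}(\cdot) = \Psi^{\mathrm{rc}}(\cdot)$, ET
always improves the upper bound (i.e.,
$G_{\mathrm{u.b.}}^{\mathrm{noET}} < G_{\mathrm{u.b.}}^{\mathrm{ET}}$) if
and only if $\xi^\star < 1$.

Proof: ET improves the performance if
$G_{\mathrm{u.b.}}^{\mathrm{noET}} < G_{\mathrm{u.b.}}^{\mathrm{ET}}$, i.e.,
$g\big(\min\{q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}), q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})\}\big) < g\big(q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))\big)$.
Since $g(\cdot)$ is an increasing
function, the previous condition is equivalent to
$\min\{q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}), q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})\} < q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))$.

• (if) $\xi^\star < 1$ means that (see Theorem 2)
$q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}) > q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$, thus the condition becomes
$q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}) < q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star))$.
Thanks to Theorem 2 and to (10)-(11),
and since $q^{\mathrm{tx}^{-1}}(\cdot)$ is increasing, if $\xi^\star < 1$, then
$q^{\mathrm{rc}^{-1}}(c^{\mathrm{rc}}(\xi^\star)) = q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}}(1 - \xi^\star)) > q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$;

• (only if) Assume $\xi^\star = 1$. In this case
$q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}) \le q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$, which implies
$G_{\mathrm{u.b.}}^{\mathrm{noET}} = g(q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}))$ and
$G_{\mathrm{u.b.}}^{\mathrm{ET}} = g(q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}}))$, thus ET does not improve the
performance upper bound. ■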

Note that when $q^{\mathrm{tx}}(\cdot) = \Psi^{\mathrm{tx}}(\cdot)$ and $q^{\mathrm{rc}}(\cdot) = \Psi^{\mathrm{rc}}(\cdot)$,
$\xi^\star < 1$ is equivalent to $q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}}) < q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})$. Thus,
for any transfer efficiency $\beta > 0$, if the average
amount of energy harvested per slot at RC corresponds
to a transmission power ($q^{\mathrm{rc}^{-1}}(\bar{b}^{\mathrm{rc}})$) greater than what is
used at TX ($q^{\mathrm{tx}^{-1}}(\bar{b}^{\mathrm{tx}})$), then the use of ET results in an
increased upper bound. When $\xi^\star = 1$, ET cannot provide
any improvement because RC is the energy bottleneck and
therefore is unable to cooperate with TX. Also, note that the
previous considerations also apply to the actual performance
for the deterministic energy arrival case (in which the upper
bounds can be achieved).

According to the above results, we can identify three main
reasons why the upper bounds may not be achieved: 1) The
functions $q^{i}(\cdot)$ and $\Psi^{i}(\cdot)$ do not coincide. In this case, the
only chance to obtain a better upper bound is to redefine
$\Psi^{i}(\cdot)$, if possible. In the following examples we show how
to derive $\Psi^{i}(\cdot)$ in several cases of interest. 2) The batteries
are small (see Corollaries 1 and 3). As the battery sizes grow,
the performance gets closer to the upper bounds. 3) The time
horizon is finite. Indeed, the save and transmit scheme of
Corollary 2 can be applied only if an infinite number of slots
are available.


C. Examples 

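Example 2. Consider the low-SNR regime (see Example 1).
In this case the energy consumptions of both the transmitter
and the receiver are linear in $P$, i.e., $q^{i}(P) = \alpha^{i} P$ (with the
factor $A$ of the low-SNR approximation absorbed into $\alpha^{\mathrm{rc}}$).
The functions $\Psi^{i}(\cdot)$ can then be taken equal to $q^{i}(\cdot)$ and the
upper bounds are

$$G_{\mathrm{u.b.}}^{\mathrm{noET}} = g\left(\min\left\{\frac{\bar{b}^{\mathrm{tx}}}{\alpha^{\mathrm{tx}}},\, \frac{\bar{b}^{\mathrm{rc}}}{\alpha^{\mathrm{rc}}}\right\}\right),$$

$$G_{\mathrm{u.b.}}^{\mathrm{ET}} = g\left(\frac{\xi^\star \bar{b}^{\mathrm{rc}}}{\alpha^{\mathrm{rc}}}\right), \qquad \xi^\star = \min\left\{1,\; \frac{\alpha^{\mathrm{rc}} \bar{b}^{\mathrm{tx}} + \alpha^{\mathrm{rc}} \beta \bar{b}^{\mathrm{rc}}}{\bar{b}^{\mathrm{rc}} \alpha^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}} \alpha^{\mathrm{rc}}}\right\}.$$

When $\xi^\star < 1$, the argument of $g(\cdot)$ in the bound with ET equals
$(\bar{b}^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}})/(\alpha^{\mathrm{tx}} + \beta \alpha^{\mathrm{rc}})$, which
is a linear combination of the average energy arrivals and
is used to balance $\bar{b}^{\mathrm{tx}}$ and $\bar{b}^{\mathrm{rc}}$.

Figure 1: $q(P)$, $\Psi(P)$, their inverse functions, and $g(q^{-1}(P))$, $g(\Psi^{-1}(P))$
of Example 4 as a function of $P$.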

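Example 3. Another interesting case is $q^{\mathrm{tx}}(P) = \alpha^{\mathrm{tx}} P$,
$q^{\mathrm{rc}}(P) = \alpha^{\mathrm{rc}} \ln(1 + A P)$ (Equation (2)) and $g(x) = \ln(1 + A x)$. Note
that $g(\cdot)$ and $q^{\mathrm{rc}}(\cdot)$ are proportional and
$g(q^{\mathrm{rc}^{-1}}(x)) = x/\alpha^{\mathrm{rc}}$
is concave. Also in this example the functions $\Psi^{i}(\cdot)$ can be
taken equal to $q^{i}(\cdot)$. The upper bounds become

$$G_{\mathrm{u.b.}}^{\mathrm{noET}} = \min\left\{\ln\left(1 + A \frac{\bar{b}^{\mathrm{tx}}}{\alpha^{\mathrm{tx}}}\right),\, \frac{\bar{b}^{\mathrm{rc}}}{\alpha^{\mathrm{rc}}}\right\},$$

$$G_{\mathrm{u.b.}}^{\mathrm{ET}} = \frac{c^{\mathrm{rc}}(\xi^\star)}{\alpha^{\mathrm{rc}}},$$

where $\xi^\star$ is the unique solution of

$$A\, \frac{\bar{b}^{\mathrm{tx}} + \beta \bar{b}^{\mathrm{rc}} (1 - \xi)}{\alpha^{\mathrm{tx}}} = e^{\xi \bar{b}^{\mathrm{rc}}/\alpha^{\mathrm{rc}}} - 1$$

if $A \bar{b}^{\mathrm{tx}}/\alpha^{\mathrm{tx}} < e^{\bar{b}^{\mathrm{rc}}/\alpha^{\mathrm{rc}}} - 1$, and $\xi^\star = 1$ otherwise.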


Example 4. We now want to show a case where $\Psi^i(\cdot)$ and $q^i(\cdot)$
are not the same. Consider g(x) = ln(l + Ax), b = = P‘^ 

and q{-) = with 


qiP) 


‘P^P, ifP<Pn, 

C + -P; if P ^ Pn, 


(16) 


with $P_n$ arbitrarily close to 0. Note that this energy consumption
model is suitable for the cases where the circuitry costs
are not negligible. If we choose $\Psi(P) = q(P)$, then it can be
verified that there exist values of $C$ and $b$ such that $g(q^{-1}(\cdot))$
is not concave. In this case $g(q^{-1}(b))$ is not guaranteed to be
an upper bound.

However, an upper bound can be found by considering a
function $\Psi(P)$ defined as in the Theorem on the upper bounds.
In Figure 1 we plot $q(P)$, $\Psi(P)$ and their inverse functions
when $A = 10$, $C = 6$,
E-axax = 11- For the purpose of illustration, we arbitrarily set 
$P_n = 1$. Note that $g(q^{-1}(P))$ is piece-wise concave whereas
$g(\Psi^{-1}(P))$ is always concave. The function $\Psi(P)$ is such
that $g(\Psi^{-1}(P))$ is divided into three regions. The two external
regions are equal to the two concave portions of $g(q^{-1}(P))$. The
central region is designed to be concave where $g(q^{-1}(P))$
is not, and is obtained by considering the straight line that is
tangent to $g(q^{-1}(P))$ in two points without intersecting it. In
Section VI we show that the upper bound given by this choice
of $\Psi(P)$ is close to the real performance.


D. Extension: Bi-directional Energy Transfer 

In the following, we present how our model can be extended
to the bi-directional ET case. In this context, TX is also able
to send part of its stored energy to RC when appropriate. In
slot $k$, TX sends an amount of energy $D_k^{tx \to rc}$ to RC, whereas
RC sends $D_k^{rc \to tx}$ to TX. The first term inside the minimum
of the TX battery update equation has to be changed to

$$E_k^{tx} - q^{tx}(P_k) + \beta^{rc \to tx} D_k^{rc \to tx} - D_k^{tx \to rc} + B_k^{tx} \qquad (17)$$

and similarly for the RC battery update equation, by switching "tx" and "rc".

The optimization in this case provides three
quantities, i.e., the transmission power, the energy sent from
RC to TX and vice-versa.⁵

The upper bound without ET does not change because
it does not depend upon ET. The Theorem on the ET upper bound
can be reformulated by changing Equations (10)-(11) as follows

-tx^^tx—j-rc ^rc—j-tx^ ^tx^tx—j-rc _|_ ^rc—)-tx^rc ^rc—)-tx^ 

db 

-rc^^tx—>^rc ^rc—j-tx^ ^rc^rc—>^tx _j_ ^tx—j-rc^tx^^ ^tx—j-rc^ 

(19) 

where represents the average fraction of the harvested 
energy that is sent from device i to device j. 

For simplicity of presentation, in our work we focus on the
uni-directional case and only outline in this section how to
extend it. Moreover, uni-directional ET can be effectively
used in the practically relevant cases where one device harvests
more energy than the other. Finally, uni-directional ET can be
seen as a simpler lower bound for the bi-directional case.


IV. Online Optimization 


We now discuss the online approach and focus on long-term
optimization. According to Section II-B, the aim of an
online policy is to define a set of rules that, given the state
of the system in a slot, specifies which action (transmission
power and transferred energy) should be used in that slot.
The online approach is interesting because it requires only a
statistical knowledge of the energy arrival process, and thus may
be effectively used in practice.

In order to formulate the problem as a discrete Markov
Decision Process (for which there exist efficient solving
algorithms), we introduce the notion of energy quanta, i.e., we
discretize the amounts of energy (energy arrivals, energy
consumptions, energy stored, energy exchanged).⁶ The batteries
have integer sizes and can be considered as buffers.

In order to obtain a consistent formulation, the values of
$e^{tx}_{max}$ and $e^{rc}_{max}$ are chosen such that the ratio between
the battery size in Joule and the battery size in quanta is the
same for the two devices. Under this assumption, one energy
quantum corresponds to that ratio, expressed in Joule. Therefore,
when we deal with the online model, all the energy values
($D_k$, $q^i(P_k)$, $b$, etc.) are expressed
as a (not necessarily integer) number of energy quanta.

With the above formulation, we model the system as
a finite two-dimensional Markov Chain (MC). When the MC
is in state $e = (e^{tx}, e^{rc})$, TX and RC have $e^{tx}$ and $e^{rc}$ energy
quanta stored in their batteries, respectively. In every state of
the MC, a decision is made on the transmission power $p(e) \in S$
of TX and on how many energy quanta RC transfers to TX,
namely $d(e) \in \{0, 1, \ldots, e^{rc}\}$.

Eollowing the approach of | |46| , pS) , | |49) , in this paper 
we only consider deterministic policies. Therefore, an online
policy $\eta$ specifies a mapping between the current state of the
system, $e$, and the corresponding action (transmitted power
$p(e)$ and transferred energy $d(e)$), i.e., $\eta = \{(p(e), d(e)), \forall e\}$.
Through an online policy $\eta$, a specific sequence of energy
arrivals can be simply mapped to a sequence of actions $(P, D)$.

The battery evolution can be rewritten in terms
of energy quanta where, instead of $\beta D_k$ and $q^i(P_k)$, we use
$\lfloor \beta D_k \rfloor$ and $\lceil q^i(p(e)) \rceil$, respectively. This choice
results in a lower bound for the real performance (however,
we verified that the upper bound obtained by using $\lceil \beta D_k \rceil$ and
$\lfloor q^i(p(e)) \rfloor$ is very similar).
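The conservative rounding described above can be illustrated with a minimal sketch of one slot of the discretized battery update. It assumes the update has the same structure as the finite-battery formulas of Section V (previous level, minus consumption, plus transferred and harvested energy, capped at the battery size); the function name `step_batteries` and its argument names are hypothetical and are not taken from the paper.

```python
import math


def step_batteries(e_tx, e_rc, q_tx, q_rc, d, b_tx, b_rc, beta,
                   e_max_tx, e_max_rc):
    """One slot of the conservative battery update, in energy quanta.

    e_tx, e_rc : quanta stored at TX and RC at the beginning of the slot
    q_tx, q_rc : energy consumed by TX and RC (possibly fractional)
    d          : quanta sent by RC to TX
    b_tx, b_rc : quanta harvested during the slot (usable from the next slot)
    beta       : transfer efficiency, 0 <= beta <= 1
    """
    used_tx = math.ceil(q_tx)        # consumption is rounded up
    used_rc = math.ceil(q_rc)
    received = math.floor(beta * d)  # transferred energy is rounded down

    # Feasibility: neither device may spend more than it has stored.
    if used_tx > e_tx or used_rc + d > e_rc:
        raise ValueError("infeasible action for the current state")

    next_tx = min(e_tx - used_tx + received + b_tx, e_max_tx)
    next_rc = min(e_rc - used_rc - d + b_rc, e_max_rc)
    return next_tx, next_rc


if __name__ == "__main__":
    print(step_batteries(10, 20, 2.3, 1.1, 7, 1, 4, 0.15, 30, 30))  # (9, 15)
```

With $\beta = 0.15$, sending 7 quanta delivers $\lfloor 1.05 \rfloor = 1$ quantum, whereas sending 6 quanta delivers none, which is why small transfers are wasted in the discrete model.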

We restrict our study to the set of feasible policies, i.e.,
those in which, for every $e$, we have $p(e) \geq 0$, $d(e) \geq 0$,
$q^{tx}(p(e)) \leq e^{tx}$ and $q^{rc}(p(e)) + d(e) \leq e^{rc}$.

The reward of Equation 0 does not depend upon the 
starting state when the underlying MC has a unique recurrent 
class 15§. Under this assumption, the long-term reward can 
be rewritten as

$$G_\eta = \sum_{e^{tx}=0}^{e^{tx}_{max}} \sum_{e^{rc}=0}^{e^{rc}_{max}} \pi_\eta(e)\, g(p(e)), \qquad (20)$$

where $\pi_\eta(e)$ is the steady-state probability of being in state
$e$ under policy $\eta$. The optimization variables of Problem (7)
become $(p(e), d(e)), \forall e$, and the maximization is performed
over all the feasible policies.

The Optimal Online Policy $\eta^*$ (OP-ON) that maximizes $G_\eta$
can be found numerically with the Policy Iteration Algorithm
(PIA) by exploiting the full energy arrivals statistics. The
algorithm starts with an initial policy (thus we arbitrarily
initialize $p(e)$ and $d(e)$) and then performs the policy evaluation
and improvement steps in order to iteratively find a new policy,
until the reward function $G_\eta$ converges (for additional details
see Section 7.2 of the reference cited for PIA).

A. Low Complexity Policies 

In addition to the optimal online policy OP-ON, here we
also introduce some simple heuristic policies, that will be used
in the numerical evaluations in Sections VI and VII to show
that, even when sub-optimal policies are adopted, the system
reward can be improved using ET.

⁵Note that for any realistic system, in which $\beta^{rc \to tx} < 1$ and
$\beta^{tx \to rc} < 1$, under the optimal policy we must have
$D_k^{tx \to rc} D_k^{rc \to tx} = 0$, i.e., transferring
non-zero energy in both directions simultaneously is strictly sub-optimal.

⁶The accuracy of the discrete approximation of the continuous case can
always be improved by using a finer quantization, which however results in
a model with more states and therefore higher complexity.

In previous works, we studied sub-optimal low complex¬ 
ity policies for EHDs in several cases pT) , ph) . However, 
when EH is combined with ET, the structure of the optimal
policy is complex and, moreover, depends upon the energy
arrival processes and the energy consumption functions. For
these reasons, it is difficult to introduce a simple policy that
approximates the optimal one in a broad range of values and
so the approaches of |2T), cannot be directly applied. 

In the general case, we define the Greedy Policy (GP) as
follows⁷

p(e) = 

$d(e) = e^{rc} - q^{rc}(p(e))$.


GP is a simple policy that empties at least one battery in
every slot and is independent of the energy arrivals. Consider
now the case where both TX and RC have $q^i(\cdot) = \Psi^i(\cdot)$.
We introduce two other policies, namely BP and LCP, as
extensions of GP.

The Balanced Policy (BP) is defined as the solution of the
following system (note that BP does not depend upon the
energy arrival statistics, a useful feature when the harvesting
process is unknown)

$$\begin{cases} E_k^{tx} + \beta D_k - q^{tx}(P_k) = E_k^{rc} - D_k - q^{rc}(P_k), \\ P_k = \min\{ q^{tx^{-1}}(E_k^{tx}),\; q^{rc^{-1}}(E_k^{rc} - D_k) \}. \end{cases} \qquad (22)$$

Instead, the Low Complexity Policy (LCP) is defined as
follows


p(e) = min{gr '(e*’'), 

d{e) = minle"'^ - q][[p{e)), - 


(23) 


where [•] = Round{-). 

In order to explain how to derive BP according to the
above definition, we neglect the floor and ceiling operations
that should be considered in the battery update formulas in
the discrete model. At the end of slot $k$, neglecting outage
and overflow, the energy levels of the two devices are

$$B_k^{tx} + E_k^{tx} + \beta D_k - q^{tx}(P_k) \quad \text{and} \quad B_k^{rc} + E_k^{rc} - D_k - q^{rc}(P_k).$$

We impose that at the beginning of the next slot these two
quantities be equal. Note that $B_k^{tx}$ and $B_k^{rc}$ are not known a
priori,⁸ thus we neglect them as well (it is possible to include
only the means of the energy arrivals, but we verified that
this refinement would not provide any significant benefit).
Also, since we need to specify both $P_k$ and $D_k$, we need an
additional equation. We impose that one of the two batteries is
emptied in every slot, and therefore choose $P_k$ as the minimum
between $q^{tx^{-1}}(E_k^{tx})$ and $q^{rc^{-1}}(E_k^{rc} - D_k)$.

⁷Note that the function $q^i(\cdot)$ may not be bijective.
In this context we define $q^{i^{-1}}(x) = \max_{p \in S:\, q^i(p) = x} p$, i.e., $q^{i^{-1}}(x)$ is
the greatest element of $S$ that is mapped to $x$. This is a reasonable choice
because, for all values of $P$ such that $q^i(P) = x$, the energy consumption
is the same but the reward $g(P)$ is different and, since $g(P)$ increases with
$P$, we choose the greatest value in order to obtain the highest reward.

⁸It is possible to relax this hypothesis if the arrival process is predictable
or partially predictable.

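Assume that an acceptable solution of (22) exists and name
it $(p, d)$. Two cases have to be considered:

1) $q^{tx^{-1}}(E_k^{tx}) < q^{rc^{-1}}(E_k^{rc} - d) \Rightarrow p = q^{tx^{-1}}(E_k^{tx})$. In
this case $q^{tx}(p) = E_k^{tx}$, so the first equation can be simplified
to $\beta d = E_k^{rc} - d - q^{rc}(p)$ and we
find $d = \dfrac{E_k^{rc} - q^{rc}(p)}{1 + \beta}$.

2) $q^{tx^{-1}}(E_k^{tx}) \geq q^{rc^{-1}}(E_k^{rc} - d) \Rightarrow p = q^{rc^{-1}}(E_k^{rc} - d)$. In
this case $p$ and $d$ can be numerically found.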

Also, it may happen that the system does not have acceptable
solutions, i.e., $p$ or $d$ is negative or exceeds the current
battery levels. In this case we proceed as follows. First, we
substitute the second equation into the first one. Then, we
find the solution of the first equation, namely $d$, following the
previous reasoning, i.e., considering the two possible cases.
Finally, if $d$ is negative, we set $d = 0$. Instead, if $d > E_k^{rc}$, we
set $d = E_k^{rc}$. $p$ is then derived from the second equation.

Once $(p, d)$ is specified, we extract the online policy as
follows (replacing $E_k^i$ with $e^i$): $p(e) = p$ and $d(e) = \lfloor d \rfloor$. We
used the floor operation in order to guarantee $q^{rc}(p(e)) +
d(e) \leq e^{rc}$ (with the round operation, the condition might not
be satisfied).

The Balanced Policy (BP), obtained according to the above 
procedure, is designed with the goal to balance the energy 
levels of the two devices. 

The Low Complexity Policy is specified in (23). Consider
the last terms of the two min operations. It can be seen
that they are the discretized versions of Equations (14)-(15)
(we applied the round operations in order to obtain two
integer values). Note that the policy in (14)-(15) does not
transmit when the batteries cannot support the use of the
prescribed power, whereas in this case LCP would instead always
use the maximum transmit power allowed by the status of the
two batteries, which results in the full discharge of at least one
of them. Although different from (14)-(15), LCP can achieve
optimality in some cases, e.g., in the presence of deterministic
arrivals.

LCP is a policy that, except for the min operators, does 
not depend upon the energy status. When the distribution 
has a small standard deviation, then it is expected that LCP 
provides good results and moreover, in the deterministic case, 
it degenerates to an optimal policy. 


V. Offline Optimization 

We now focus on offline optimization. One of the key
aspects of this approach is that the energy arrival sequence
is assumed to be known a priori (a statistical knowledge of
the arrival process is not sufficient). Therefore, we restrict the
study to the finite horizon case, considering separately the two
cases of infinite and finite batteries. In this context, the aim
is to find the Optimal Offline Policy $\eta^*$ (OP-OFF), i.e., the
sequence of actions $(P, D)$ that maximizes the finite-horizon reward.
In Section VII we will use OP-OFF as a benchmark for the
online ones in the finite horizon case.⁹

⁹In this case, we simply apply to the finite horizon scenario the optimal
online policy for infinite horizon derived in Section IV.




A. OP-OFF - Infinite Batteries 

We first set up the offline optimization problem (7) by
clearly specifying the constraints that have to be satisfied and
the optimization variables, in the case where the battery sizes
are infinite. A formulation for the case with finite batteries
will be given in the next subsection.

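In this case, the optimization problem in (7) can be explicitly
written as follows (we start with empty batteries):

$$\max_{P,\,D} \; \sum_{k=1}^{K} g(P_k) \qquad (24a)$$

subject to

$$q^{tx}(P_k) \leq E_k^{tx}, \quad k = 1, \ldots, K, \qquad (24b)$$

$$q^{rc}(P_k) + D_k \leq E_k^{rc}, \quad k = 1, \ldots, K, \qquad (24c)$$

$$P_k \geq 0, \; D_k \geq 0, \quad k = 1, \ldots, K, \qquad (24d)$$

$$E_{k+1}^{tx} = E_k^{tx} - q^{tx}(P_k) + \beta D_k + B_k^{tx}, \quad k = 1, \ldots, K, \qquad (24e)$$

$$E_{k+1}^{rc} = E_k^{rc} - q^{rc}(P_k) - D_k + B_k^{rc}, \quad k = 1, \ldots, K, \qquad (24f)$$

$$E_1^{tx} = E_1^{rc} = 0. \qquad (24g)$$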

Note that the battery evolutions include neither min operations
(because the batteries are infinite) nor max operations (thanks
to (24b) and (24c)). We recall that the energy harvested in slot
$k$ can be exploited only in a later slot and similarly for the
energy transferred with ET ($\beta D_k$).

Lemma 1. $S = \{\eta = (P, D) : \text{(24b)-(24d) are satisfied}\}$ is
a convex set.

Proof: $S$ is a convex set if $q^{tx}(P_k) - E_k^{tx}$, $q^{rc}(P_k) + D_k -
E_k^{rc}$, $-P_k$ and $-D_k$ are concave functions of $(P_k, D_k)$ for every
$k = 1, \ldots, K$. These conditions are satisfied because $q^i(P_k)$
are defined as concave functions and the other constraints are
linear. ■

Since the reward function is convex (sum of convex functions)
and $S$ is a convex set, (24) is a convex problem and can
be solved using standard optimization techniques.

(27c)

and similar constraints have to be considered for the transmitter.
The total number of constraints scales as $K^2$.

The general expressions for the transmitter and receiver
constraints can be written in compact form as ($i = 1, \ldots, k$
and $k = 1, \ldots, K$)

$$\sum_{j=i}^{k} Q_j^{tx} - \beta \sum_{j=i}^{k-1} D_j \leq e^{tx}_{max}\, \chi\{i > 1\} + \sum_{j=i}^{k-1} B_j^{tx}, \qquad (28)$$

$$\sum_{j=i}^{k} (Q_j^{rc} + D_j) \leq e^{rc}_{max}\, \chi\{i > 1\} + \sum_{j=i}^{k-1} B_j^{rc}, \qquad (29)$$

where $\chi\{\cdot\}$ is the indicator function. The four cases in (27)
can be obtained from (29) for $k = 1, 2, 3, 4$ (note that there
are $k$ constraints in each case, obtained for $i = 1, \ldots, k$).
For example, when $i = 1$ or $i = k$, the last and the first
lines of (27) are obtained, respectively.

In practice, techniques such as the interior-point algorithm
or the SQP algorithm can be used to find the optimal solution.
However, if the time horizon is large, the computational
time can be long. Moreover, to run the algorithms, $\{B_k^{tx}\}$ and
$\{B_k^{rc}\}$ must be known in advance. Thus, even if the offline
optimization gives the policy with the highest reward among
all, in practice it can rarely be used. On the other hand,
finding the optimal offline policy is still useful, as it makes
it possible to understand the limits of the energy
transfer mechanism, and can be used as a benchmark for all
other policies.


B. OP-OFF - Finite Batteries 

When the battery sizes are finite, the optimization problem
is the same as in Equations (24a)-(24d), with the battery update
formulas (24e)-(24f) replaced by

$$E_{k+1}^{rc} = \min\{E_k^{rc} - q^{rc}(P_k) - D_k + B_k^{rc},\; e^{rc}_{max}\}, \qquad (25)$$

$$E_{k+1}^{tx} = \min\{E_k^{tx} - q^{tx}(P_k) + \beta D_k + B_k^{tx},\; e^{tx}_{max}\}. \qquad (26)$$

The problem can be formulated in a standard form (convex
function to minimize plus inequality and equality constraints)
by adding an inequality constraint for every possible condition
imposed by the min operations. For example, for the receiver,
the first four inequalities that have to be satisfied are
($Q_k^{rc} = q^{rc}(P_k)$)


Qi +Di<Q, + D2< 


max’ 

- g- - Di, 


(27a) 


VI. Numerical Results - Online Optimization 


In this section we present some numerical results for the
online policies. In order to understand their properties, here
we consider some analytical examples in the infinite horizon
case. In Section VII we discuss how these policies can be
applied to a realistic scenario, with finite horizon and real
data.

In addition to studying the optimal policy OP-ON, we
present the performance of sub-optimal policies in several
settings. We remark that, since we focus on the online case,
all energies are expressed in terms of energy quanta.

We consider the long-term maximization of $G_\eta$ (Equation (20))
when the reward function is
the transmission rate $g(x) = \ln(1 + Ax)$, where $A$ is a scaling
factor. $\eta^*$ is the optimal policy obtained when ET is used;
the optimal policy without ET is defined analogously.

Figure 2: Energy consumptions $q^{tx}(P)$ and $q^{rc}(P)$ as a function of $P$ for
several values of $A$.

Figure 3: Steady-state probabilities ($10 \log_{10}(\cdot)$ scale) without (left) and with
(right) ET as a function of the batteries energy status.


The numerical results strongly depend upon the system
parameters, and on the structure of $g(\cdot)$ and $q^i(\cdot)$. In the
following we focus on a particular energy consumption model,
but similar considerations can be made in other cases as well.
Consider the following energy consumption functions ($C > 0$)
expressed in energy quanta

r i^p, if p < p„, 

g-(P) + ^ (30) 

[ + a^'=ln(l + AP), ifP>Pn 

and $q^{tx}(P)$ is piece-wise linear as in Equation (16) with
$P_n = 1/100$. The coefficients of the two curves are parameters
that depend upon the considered technology. Both devices have
a fixed energy cost $C$¹⁰ plus a linear or logarithmic curve.

If not otherwise specihed, we consider Cmax = — 

^max — 30, truncated geometric arrivals with parameters = 
2, &max = 5 for TX, uniform energy arrivals with parameters 
¥<= = 12.5, = 25 for RC, C = 7, 4, A = 0.1, 

$\beta = 0.15$, a unit slot length and $p_{max} = e_{max}$ (in a slot,
potentially, all the stored energy can be consumed).

In Figure 2 the bold curve represents the energy consumption
considered in this example. Note that in the online
optimization we consider its discretized version, as described in
Section IV.

¹⁰We decided to focus on the case $C^{tx} = C^{rc}$ for presentation simplicity,
but this assumption is not restrictive.

Figure 4: Long-term average transmission rates with and without ET (optimal rewards)
and corresponding improvement as a function of $A$ when $e_{max} = 30$
and $C = 7$.

We define the following functions exploiting the technique
introduced in Example 4


4'‘^(P) 


fAln(l + AP), 

\c + p, 

P'^iPraax) p 
Pmax 


if P < a: — C, 

otherwise, 


(31) 


with praax = Smax — C, X = 20.99 and m = 0.0417. It 
can be verified that these functions satisfy the hypotheses of
the Theorem on the upper bounds, and the upper bounds are
$G^{noET}_{u.b.} = 0.0834$ and $G^{ET}_{u.b.} = 0.1561$.

In Figure 3 we show the steady-state distribution of the
system state when using the optimal policies with and without
ET. As expected, when energy transfer is not used, the energy
levels are highly unbalanced and the receiver is almost always
in overflow. With energy transfer, instead, the overflow
probability becomes lower. In this case, even in the presence of a
relatively low efficiency $\beta$ (85% of the energy sent is wasted),
energy transfer provides a reward improvement of 78%, see
Figure 4. Note that the improvement is due to the fact that
RC can send part of its energy to TX and this is particularly
effective when RC receives more energy and/or consumes less
energy than TX. A comparison with the upper bounds shows
that the optimal reward without ET exceeds $0.99\, G^{noET}_{u.b.}$ and
the optimal reward with ET exceeds $0.95\, G^{ET}_{u.b.}$. The reward
without ET and its upper bound are very close (this happens
because the batteries are large). Instead, with ET the distance
from the upper bound is wider because the function $\Psi^i(\cdot)$
is distant from $q^i(\cdot)$ and the batteries are not sufficiently
large. When $A \in \{0.001, 1, 10\}$ the improvements provided
by the use of ET become $\{83, 64, 45\}\%$, thus the performance
is significantly increased in a wide range of values of $A$. This
can be observed in Figure 4, where we plot the rewards with
and without ET, along with the corresponding improvement,
defined as the difference between the rewards with and without
ET, normalized by the reward without ET. ET works better in the low
SNR regime because $g(\cdot)$ tends to be linear, thus smart energy
transmission techniques (e.g., delaying a transmission in order
to transmit with more power) do not improve the reward
significantly.

Figure 5: Long-term average transmission rates with and without ET (optimal rewards)
and corresponding improvement as a function of $C$ when $e_{max} = 30$
and $A = 0.1$.

Figure 6: Long-term average transmission rates with and without ET (optimal rewards)
and corresponding improvement as a function of the size of one battery when the
other is fixed at 30, $C = 7$ and $A = 0.1$.

Figure 5 shows how the two rewards (with and without
energy transfer) change as a function of $C$. When $C$ is very
high, in both cases the value of the reward is very small in
absolute terms (see Figure 5), but the use of energy transfer may
provide a significant reward improvement in relative terms as
pointed out by the improvement curve > l.SG,,*). Thus, 
it is better to use Energy Transfer even when $C$ is high. Even
if we present our results for $C^{tx} = C^{rc}$, similar results can
be found in the general case. In particular, if either energy
consumption decreases, then the reward improvement and
the reward itself increase (similarly to Figure 5) and vice-versa.

Also, in Figure 6 we plot the reward when the size of one battery
is fixed at 30 and the other changes (a similar curve can be obtained
by switching the roles of the two batteries). The ET improvement increases
with the battery size. The abscissa values start from 7 since,
for smaller batteries, the reward is zero because of the circuitry costs.

As an additional interesting example, consider the case ( = 
0, where and The energy 

consumption functions are 

gtx(p) ^ ^ ( 32 ) 

In this case $\Psi^i(\cdot) = q^i(\cdot)$. The distances from $G^{noET}_{u.b.}$ and
$G^{ET}_{u.b.}$ are 0.25% and 3.3%, respectively. For larger batteries
the upper bound gaps are even smaller. We also computed
the rewards of policies GP, BP and LCP and we found
$G_{GP} = 0.88\, G_{\eta^*}$, $G_{BP} = 0.88\, G_{\eta^*}$, $G_{LCP} = 0.82\, G_{\eta^*}$, i.e.,
in this particular case, the simpler policies achieve between 82%
and 88% of the reward of OP-ON, while being significantly faster to
compute.

VII. Real Data Analysis 

In this section we want to apply the policies found so
far to some realistic examples. Since in reality only a finite
sequence of energy arrivals can be available, we focus on
the optimization of the finite-horizon reward. If we assume that
the energy arrivals are known a priori, the offline optimal
policy (OP-OFF) provides the best reward among all. Instead,
to compute the online policies, only the statistics of the energy
arrivals are required. In this section, in addition to discussing the
benefits of ET, we compare the offline and online approaches.
As in Section V, we consider separately the cases of infinite
and finite batteries.

Figure 7: Indoor light energy arrivals as a function of the time of the day.

A. Infinite Batteries 

Consider a scenario with two devices in two different rooms 
of a building, where energy harvesting is based on indoor light. 

At enhants.ee.columbia.edu, a collection of light energy data
traces is available.¹¹ The authors took measurements of the
irradiance in different indoor rooms during an extended period
of time. We use part of this data in our performance evaluation.

We assume that TX is located on a bookshelf in an office
(Setup A) and the receiver in another office (Setup B). The
receiver, generally, harvests more energy than the transmitter
because it gets more sunshine. We show in Figure 7 the
irradiance arrivals for the two devices (measured on 09 January
2010). It can be seen that, in this case, RC receives significantly
more energy than the transmitter, therefore it may be
interesting to use energy transfer to try to balance the system.
In this setup, the harvested power is at most 113 μW/cm²,
i.e., very low. In an indoor environment, an ultra low power
sensor network should be deployed, otherwise the energy costs
would be too high to be sustained by the renewable energy
source. Therefore, we assume that the transmitter can choose
its transmit power to be even lower than 1 mW. In this
case, it can be verified that the effects of a finite battery
can be neglected (even if a very small battery is used, e.g.,
0.16 J [52]), thus in this section we can consider infinite
batteries with no loss of generality.

¹¹These data were discussed in [52] by Gorlatova et al.

Figure 8: Policies BP (left) and OP-OFF (right) as a function of the time of
the day.

Figure 9: Battery energy status of BP (left) and OP-OFF (right) as a function
of the time of the day.

Time is divided in slots of 60 s each, and in every slot
a new $(P_k, D_k)$ is chosen. The maximum energy that can
arrive in 60 s is 60 s × 113 μW/cm² × $S$ cm², where $S$
is the solar panel size (assumed equal for the two devices).
We compute the reward using $g(x) = \ln(1 + Ax)$ in a low
SNR regime ($A = 0.002$). In order to highlight the system
behavior, we present the results for $q^{tx}(P) = q^{rc}(P) = P$. The
model can be extended, e.g., using the energy consumption
model of Section VI, which would result in an even better
improvement because RC would consume less energy.


We use two approaches to apply ET to the system: 1) the online
low complexity balanced policy (BP), which is very easy to
compute and can be used in practice, and 2) the offline optimal
policy (OP-OFF) (presented in Section V-A). We selected
1 μW/cm² as the minimum non-negligible power that can
be harvested. In this case, one energy quantum corresponds to
the minimum energy that can arrive in 60 s, i.e.,

$$1 \text{ e.q.} = 60 \text{ s} \times 1\ \mu\text{W/cm}^2 \times S \text{ cm}^2. \qquad (33)$$

Figure 8 shows the sent data and energy (expressed in
energy quanta) for BP and OP-OFF. In Figure 9 the corresponding
energy evolutions are presented.

BP is designed in order to balance the energy of the
two devices. Indeed, when the transmitter battery is low,
$D_k$ (the energy transferred from RC to TX) is high, i.e., ET is better
exploited when the difference between the energy arrivals is
high. Analytically, it can be verified that in the linear energy
consumption case, BP degenerates into the following policy:

$$d(e) = \left\lfloor \left( \frac{e^{rc} - e^{tx}}{1 + \beta} \right)^{+} \right\rfloor, \qquad (34)$$

$$p(e) = \min\{ e^{tx},\; e^{rc} - d(e) \}, \qquad (35)$$

where $(\cdot)^+ = \max\{\cdot, 0\}$.
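The closed form of BP for linear consumption can be sketched in a few lines. The sketch assumes $q^{tx}(P) = q^{rc}(P) = P$, so that balancing $e^{tx} + \beta d - p = e^{rc} - d - p$ gives $d = (e^{rc} - e^{tx})/(1 + \beta)$, clipped at zero and floored as in the BP derivation of Section IV-A; the function name `balanced_policy_linear` is hypothetical.

```python
import math


def balanced_policy_linear(e_tx, e_rc, beta):
    """Balanced Policy for linear consumption q_tx(P) = q_rc(P) = P.

    Returns (p, d): the transmit power and the quanta RC sends to TX.
    """
    # Transfer that equalizes the two batteries, never negative, rounded down.
    d = math.floor(max((e_rc - e_tx) / (1.0 + beta), 0.0))
    # Empty at least one battery: TX spends p, RC spends p + d.
    p = min(e_tx, e_rc - d)
    return p, d


if __name__ == "__main__":
    print(balanced_policy_linear(5, 25, 0.15))  # (5, 17)
    print(balanced_policy_linear(10, 4, 0.15))  # (4, 0)
```

When RC holds less energy than TX, no transfer takes place and the receiver battery limits the transmit power; otherwise the transmitter battery is the one that is emptied.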

On the left side of Figure 9 we depicted $E_k^{tx}$ and $E_k^{rc} - D_k$
in order to compare the two arguments of Equation (35). It
can be seen that $E_k^{tx}$ is always lower than $E_k^{rc} - D_k$, thus
Equation (35) becomes $p(e) = e^{tx}$ (indeed the curves of $P_k$
and $E_k^{tx}$ are the same), i.e., the transmitter battery is emptied
in every slot. Moreover, note that $E_{k+1}^{tx} = B_k^{tx} + \lfloor \beta D_k \rfloor$, i.e.,
the status of the transmitter battery is similar to $B_k^{tx}$, but higher
(thanks to energy transfer).

Instead, OP-OFF chooses the initial values of $P_k$ and $D_k$
in order to reach a situation where $P_k$ and $D_k$ can be
kept constant. This is possible because we consider infinite
batteries. The resulting battery trends are represented on the
right side of Figure 9. Note that $P_k$ and $D_k$ were chosen in
order to have zero energy stored in slot $K + 1$, i.e.,
all the available energy is exploited in the finite horizon of $K$
slots. Differently from the previous case, $E_k^{tx}$ is greater than
$E_k^{rc}$ in the central region because TX receives a lot of energy
and RC transfers its energy to TX.

Note that, if ET is not employed, an upper bound for the
performance is given by the minimum between the means of
$\{B_k^{tx}\}$ and $\{B_k^{rc}\}$, whereas, if ET is used, the upper bound is
given by the corresponding bound with ET.¹²

¹²The upper-bound Theorems can be reformulated using the temporal means in this
case.

Figure 10: Solar energy arrivals as a function of the time of the day.

Figure 11: Optimal offline and online rewards as a function of $e_{max}$
when $A = 0.1$ and $\beta \in \{0.15, 0.50, 1.00\}$.

BP gives a reward equal to 0.0512, whereas the optimal offline
reward with ET is 0.0528 and the optimal offline reward without
ET is 0.0411. The upper bounds with and without
ET are $G^{ET}_{u.b.} = 0.0532$ and $G^{noET}_{u.b.} = 0.0414$. Note that
the optimal offline rewards with and without ET equal
$0.99\, G^{ET}_{u.b.}$ and $0.99\, G^{noET}_{u.b.}$, respectively, i.e., OP-OFF is
very close to but does not achieve the upper bounds even if
the batteries are infinite, and this is because we consider a finite
time horizon. The reward improvement due to ET is 28%. Note
that, even though BP is a sub-optimal policy (much simpler
to compute than OP-OFF) and only has a causal knowledge
of the energy arrivals, its reward $G_{BP}$ is very close to that of
the optimal offline policy.
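The figures quoted in this paragraph can be cross-checked with a short worked computation. It assumes that the improvement is the gain of the reward with ET over the reward without ET, normalized by the latter, which is the reading consistent with the 28% stated above.

```latex
\[
\frac{0.0528 - 0.0411}{0.0411} \approx 0.285, \qquad
\frac{0.0512}{0.0528} \approx 0.970, \qquad
\frac{0.0528}{0.0532} \approx 0.992, \qquad
\frac{0.0411}{0.0414} \approx 0.993 .
\]
```

Under this reading, ET yields a gain of about 28%, BP attains about 97% of the optimal offline reward, and both offline rewards sit within 1% of their respective upper bounds.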


B. Finite Battery Effects 


In the previous section we assumed infinite batteries, which 
is legitimate in the indoor environment we considered. How¬ 
ever, when the solar panel is powered with direct sunlight, it 
is likely that an inappropriate use of the energy may lead to 
battery overflow. At | |5^ , a collection of solar light measure¬ 
ments in several locations over the past years is available and 


in Figure 10 we show the irradiance measured in Elizabeth
City on 20 July 2014. The continuous lines represent all
the measured data. We performed a sampling and considered
only the points depicted with squares and circles. This is
in order to perform the offline optimization in a reasonable
computational time (we recall that with finite batteries the
number of constraints grows quadratically with the number
of samples). We considered the same energy arrival profile
for both transmitter and receiver, but we assumed that the
transmitter has a solar panel three times smaller than RC
(in reality, the two devices could also receive different solar
energy because of their position). We scaled the irradiance data
in order to apply an MDP approach to solve the problem: the
histograms of the two energy arrival profiles were assumed as
empirical pdfs of the two arrival processes and we found
OP-ON according to the model of Section IV. Since this approach
is sub-optimal because it assumes i.i.d. energy arrivals, we
compared it with OP-OFF, that gives the best possible results.


Figure 11 shows the (simulated) rewards with and without 
ET as a function of the battery sizes. We considered the model 
of Equation ( |^ with A = 0.1, emax = = ^max 

and $\beta \in \{0.15, 0.50, 1.00\}$. When $e_{max}$ is low, even when $\beta = 1$,
ET does not improve the system reward. This is because the
energy harvesting mechanism manages to fill up both batteries
almost all the time, thus it is not necessary to exchange
energy. Instead, when the size of the batteries grows, ET may
significantly improve the reward: when $e_{max} = 5$ the ratio
between the rewards with and without ET
for $\beta \in \{0.15, 0.50, 1.00\}$ is $\{1.12, 1.30, 1.44\}$, and
becomes $\{1.33, 1.91, 2.51\}$ when $e_{max} = 20$.



Figure 12: Rewards as a function of $e_{max}$ when $A = 0.1$ and
$\beta = 0.15$ for several policies.


Beyond a certain value of $e_{\max}$, the rewards saturate,
so very large batteries are not needed to
achieve high rewards. This is because the effects of outage and
overflow become negligible. Note that, because of the transmitter
energy arrivals, without ET the system reward saturates
very soon, whereas, with energy transfer, the saturation value
is only reached for higher $e_{\max}$. Note also that for $\beta = 0.15$ and
$e_{\max} < 7$ the reward is low; this is due to the discretization
($\lfloor \beta e_{\max} \rfloor = 0$ if $e_{\max} < 7$).
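The discretization effect can be checked numerically. The sketch below assumes that the energy delivered by a transfer is the floor of $\beta$ times the energy sent, and that at most $e_{\max}$ quanta can be sent at once; the helper name `max_received_quanta` is hypothetical.

```python
from fractions import Fraction
from math import floor


def max_received_quanta(beta, e_max):
    """Whole quanta received when e_max quanta are sent.

    Assumes received = floor(beta * sent); beta is passed as a string
    so that Fraction represents it exactly (no floating-point error).
    """
    return floor(Fraction(beta) * e_max)


beta = "0.15"  # ET efficiency used in the example of the text
received = {e_max: max_received_quanta(beta, e_max) for e_max in range(1, 11)}

# Smallest battery size for which a transfer delivers at least one quantum.
smallest_useful = min(e for e, r in received.items() if r > 0)
```

Under these assumptions no transfer can deliver a full quantum until the battery holds at least 7 quanta, which matches the threshold quoted in the text.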

In this example OP-ON and OP-OFF are very close, which 
makes online policies very good candidates for application 
in real scenarios, because they are easier to implement while 
being almost optimal. 

Finally, in Figure 12 we plot OP-OFF, OP-ON, the sub-optimal
online policies BP and LCP, and the upper bounds.
Note that, among the online policies, OP-ON may be lower than
BP (at $e_{\max} = 6$, for example). This is because OP-ON is
optimal in the long run, so in a particular realization it
may turn out to be sub-optimal. OP-OFF increases with $e_{\max}$
and almost reaches the upper bound (which is not achieved
because the simulation time is finite). The Balanced Policy is
generally better than the Low Complexity Policy, because BP
operates with the energy levels (see Equation (2?)), whereas
LCP operates with the average energy arrival statistics (see
Equation (23)).


VIII. Conclusions 

In this paper we jointly analyzed two mechanisms, namely
Ambient Energy Harvesting and Wireless Energy Transfer, that
can be used to improve the network performance. We studied a
scenario composed of two Energy Harvesting Devices, a transmitter
and its receiver, that can exchange energy through an
Energy Transfer interface. We considered two generic energy
consumption functions and found performance upper bounds
with and without ET, showing that, under some assumptions,
they are achievable. Then we studied the online and offline
optimization problems. In the first case we modeled the system
with an MDP, studying numerically the optimal online policy
and introducing some low complexity policies. For the offline
optimization we set up the optimization problem and showed
that it is convex. In our numerical evaluations we derived the
optimal transmission policies, showing that ET can significantly
improve the system performance and discussing how the
system behaves as a function of the system parameters. For
example, we noticed that the reward improvement increases
with the battery sizes and remains high even for large values of
the circuitry cost. Also, we analyzed two realistic examples of
indoor and outdoor light radiation, showing the effects of finite
batteries on the transmission strategies. Possible extensions
of our work are the exploitation of the predictability and
correlation of the transmitter and receiver energy sources, and
the consideration of battery imperfections.


Appendix A 
Proof of Theorem 1
The energy harvesting mechanism imposes
Using the definitions (?)-(?) and the hypotheses, we have
The relation holds for both TX and RC, thus, since we deal
with increasing functions, i) is obtained. 

For the last point of the theorem we introduce the following
proposition.


Proposition 2. If $\Psi^i(P)$ does not exist, then the battery of
device $i$ is infinite.


Proof: We will equivalently show that if the battery size is
finite, then $\Psi^i(\cdot)$ always exists. Since the battery is finite, the
transmission power is bounded by $P_{\max} < \infty$. The function
$\Psi^i(\cdot)$ can be chosen as a linear function $\Psi^i(P) = mP$, where
$m$ is a slope such that $mP < q^i(P)$. Since $\Psi^i(\cdot)$
is linear, its inverse is also linear. In this case $g((\Psi^i)^{-1}(\cdot))$
is concave because $g(\cdot)$ is concave, therefore $\Psi^i(\cdot)$ can be
correctly defined and therefore always exists. ■

Now, assume that $\Psi^i(\cdot)$ does not exist for either device.
This implies that the battery sizes are infinite and in this case
$g((q^i)^{-1}(P))$ for large $P$ increases faster than $P$ (otherwise
$\Psi^i(P)$ can be found). To show that the reward tends to infinity,
consider the following policy over a time horizon of $K$ slots:

$$P_1 = P_2 = \dots = P_{K-1} = 0, \qquad P_K = \ldots\left(\sum_{k=1}^{K-1} \ldots\right).$$

The corresponding reward tends to infinity as $K \to \infty$
because the argument of $g(\cdot)$ grows
linearly in $K$.
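The limiting argument can be illustrated numerically. The sketch below is not the authors' construction: it assumes a constant energy arrival per slot, a reward averaged over the $K$ slots, zero reward in idle slots, and a hypothetical function `h` standing in for the composition of $g(\cdot)$ with the inverse of the consumption function.

```python
def burst_policy_average_reward(K, arrival, h):
    """Average reward when idling for K-1 slots and then spending everything.

    K       : horizon length in slots
    arrival : energy harvested per slot (assumed constant for illustration)
    h       : hypothetical map from spent energy to reward
    """
    stored = (K - 1) * arrival   # energy accumulated before the last slot
    return h(stored) / K         # only slot K yields a non-zero reward


def superlinear(e):
    return e ** 2                # grows faster than linearly (assumption)


def linear(e):
    return 2.0 * e               # grows linearly, shown for contrast


horizons = [10, 100, 1000]
rewards_super = [burst_policy_average_reward(K, 1.0, superlinear) for K in horizons]
rewards_lin = [burst_policy_average_reward(K, 1.0, linear) for K in horizons]
```

With the superlinear map the average reward keeps growing with the horizon, whereas with the linear map it stays below 2. This is the distinction the proof relies on: an unbounded reward requires both an infinite battery and a reward that grows faster than linearly in the spent energy.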
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